AITopics | ridge regression

Collaborating Authors

ridge regression

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Bowyer, Sam, Locatelli, Acyr, Cao, Kris

arXiv.org Machine LearningMay-26-2026

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression with feature selection, we find that existing efficient benchmarking methods can be greatly improved by simply using kernel ridge regression at the prediction stage. Additionally, using an information-theoretic feature-selection algorithm called minimum redundancy maximum relevance (mRMR), we can further improve upon these methods by selecting question subsets that will be maximally useful for prediction. Except in very data-poor settings, these approaches consistently achieve smaller prediction errors (in both MAE and RMSE), and greater ranking correlation between predicted and true scores (in both Spearman $ρ$ and Kendall $τ$) across a range of benchmarks using both binary and continuous metrics. Furthermore, mRMR subsampling is much faster than competitor methods (which often involve fitting probabilistic models or running clustering algorithms), and is more likely to select the same questions under different random seeds or training data splits. Tutorial code can be found at https://github.com/sambowyer/mrmr_eval .

artificial intelligence, coreset, machine learning, (13 more...)

arXiv.org Machine Learning

2605.25773

Country:

Europe (1.00)
North America > United States > California (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

Lecoiu, Radu, Mukherjee, Debarghya, Sur, Pragya

arXiv.org Machine LearningMay-19-2026

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s-k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers share spectral shrinkage estimators and a common server seeks to aggregate them to achieve optimal performance. In this case, we find that the best local rule again takes the form of self-distillation, though it differs from the optimal rule when data are hosted centrally on a single server. Together, our results elucidate why self-distillation improves predictive performance and provide a broader statistical framework connecting it with classical shrinkage-based methods.

artificial intelligence, machine learning, theorem 3, (19 more...)

arXiv.org Machine Learning

2605.17778

Country: North America > United States > New York (0.27)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms

Neural Information Processing SystemsApr-29-2026, 11:39:41 GMT

We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression and various implementations of gradient descent.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Add feedback

Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions

Lu, Weihao, Lin, Qian, Xia, Yingcun, Huang, Dongming

arXiv.org Machine LearningApr-28-2026

Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the largedimensional setting where the sample size and dimension are of comparable order, i.e., n dγ for some γ > 0. We first consider inner-product kernels on the sphere Sd 1 and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions s 0, where smeasures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever sis positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in Rd whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions. Keywords: Spectral algorithms, learning curves, high dimension, benign overfitting. 1 Introduction Nonparametric regression studies the estimation of an unknown function f: Rd R from ni.i.d.

artificial intelligence, learning curve, machine learning, (18 more...)

arXiv.org Machine Learning

2604.23212

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Add feedback

Can we globally optimize cross validation loss in ridge regression

Neural Information Processing SystemsApr-27-2026, 03:01:02 GMT

Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Crossvalidation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors, and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory using simulated experiments.

artificial intelligence, machine learning, optimization problem, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.82)

Add feedback

543bec10c8325987595fcdc492a525f4-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 22:47:00 GMT

artificial intelligence, machine learning, regime, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

477bdb55b231264bb53a7942fd84254d-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 16:55:31 GMT

artificial intelligence, machine learning, regression, (19 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Genre: Research Report > New Finding (0.47)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Add feedback

ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions

Neural Information Processing SystemsApr-25-2026, 09:56:48 GMT

We introduce ParK, a new large-scale solver for kernel ridge regression. Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity while provably maintaining the same statistical accuracy. In particular, constructing suitable partitions directly in the feature space rather than in the input space, we promote orthogonality between the local estimators, thus ensuring that key quantities such as local effective dimension and bias remain under control. We characterize the statistical-computational tradeoff of our model, and demonstrate the effectiveness of our method by numerical experiments on large-scale datasets.

artificial intelligence, estimator, machine learning, (15 more...)

Neural Information Processing Systems

Technology: